# High-precision OCR

En PP OCRv4 Mobile Rec

An ultra-lightweight English text-line recognition model developed by the PaddleOCR team. It recognizes English letters and digits.

Text Recognition Supports Multiple Languages

SLANeXt_wired is a deep learning model for table structure recognition that converts non-editable table images into editable table formats such as HTML.

Text Recognition Supports Multiple Languages
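
As a rough illustration of what turning a recognized table into HTML can involve, the sketch below merges a sequence of structure tokens with per-cell text. The token vocabulary, the `build_html_table` helper, and the sample data are assumptions made for illustration; they are not SLANeXt_wired's actual output format or API.

```python
from html import escape


def build_html_table(structure_tokens, cell_texts):
    """Merge structure tokens with per-cell text into one HTML string.

    Hypothetical helper: assumes each "<td>" token opens one cell and that
    cell_texts lists the recognized text of the cells in reading order.
    """
    cells = iter(cell_texts)
    parts = ["<table>"]
    for token in structure_tokens:
        if token == "<td>":
            # Open the cell and fill in its text, escaping HTML-special characters.
            parts.append("<td>" + escape(next(cells, "")))
        else:
            parts.append(token)
    parts.append("</table>")
    return "".join(parts)


# Illustrative input: a 2x2 table (made-up data).
tokens = ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>",
          "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]
texts = ["Item", "Price", "Tea", "3 < 5"]

html_table = build_html_table(tokens, texts)
print(html_table)
```

Keeping structure and cell text separate until the last step means the same structure sequence can be reused if the cell text is later corrected by a different recognizer.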

PP OCRv5 Server Det

PP-OCRv5_server_det is the latest-generation text detection model developed by the PaddleOCR team. It is designed for high-performance application scenarios and detects text in a variety of conditions, including handwritten, vertical, rotated, and curved text, across multiple languages.

Text Recognition Supports Multiple Languages

Llama 3.1 Nemotron Nano VL 8B V1

Llama-3.1-Nemotron-Nano-VL-8B-V1 is a vision-language model for document intelligence that can query and summarize images and videos and supports deployment in multiple environments.

Sapnous-6B is an advanced vision-language model that enhances perception and understanding of the world through powerful multimodal capabilities.

Transformers English

Aya Vision 32B is an open-weight 32B parameter multimodal model developed by Cohere Labs, supporting vision-language tasks in 23 languages.

Transformers Supports Multiple Languages

Typhoon2 Qwen2vl 7b Vision Instruct

Typhoon2-Vision is a vision-language model with Thai language support. It can process image and video inputs and is specifically optimized for image-based applications.

Transformers Supports Multiple Languages

Paligemma2 3b Mix 224

PaliGemma 2 is an upgraded vision-language model developed by Google that incorporates the capabilities of Gemma 2. It takes image and text inputs and generates text outputs, which makes it suitable for a range of vision-language tasks.

TF-ID is a series of object detection models specifically designed to extract tables and figures along with their caption texts from academic papers.

TF-ID is a visual object detection model, fine-tuned from Florence-2, specifically designed for extracting tables and charts from academic papers.

Object Detection

Pix2text Mfr Quantized

Pix2Text's Mathematical Formula Recognition (MFR) model, trained based on the TrOCR architecture, can convert mathematical formula images into LaTeX text representations.

Text Recognition

Pix2Text's Mathematical Formula Detection (MFD) model, which locates mathematical formulas in images.

Text Recognition Other

Sparrow is a document data extraction model fine-tuned on invoice data based on the Donut ML foundation model, designed to validate Donut's performance on enterprise documents.

Transformers English

Sparrow is a document data extraction tool fine-tuned on invoice data based on the Donut ML foundation model, designed to validate Donut's performance on enterprise documents.

Transformers English

An image-to-text model released under the Apache-2.0 license, capable of converting image content into textual descriptions.

Text Recognition

Output LayoutLMv3 V7

A document understanding model fine-tuned from microsoft/layoutlmv3-base that excels at document layout analysis tasks.

Text Recognition

MiniCPM-V 2.0 is a multimodal large language model designed for efficient on-device deployment. It is built on SigLip-400M and MiniCPM-2.4B, connected by a perceiver resampler.

Transformers Supports Multiple Languages

Trocr Base Plate Number

A vision model for recognizing vehicle license plates, capable of extracting plate numbers from images.

Text Recognition

Pix2Text's Mathematical Formula Recognition (MFR) model, trained based on the TrOCR architecture, capable of converting mathematical formula images into LaTeX text representations.

Text Recognition

Trocr Base Printed License Plates Ocr Timestamp

An OCR model fine-tuned from microsoft/trocr-base-printed, specifically designed for recognizing license plates and timestamp information.

Text Recognition

Nougat For Formula

A mathematical formula recognition model fine-tuned from Nougat-small that excels at extracting LaTeX formula code from images.

CORD-v2 is a model for image-to-text tasks, primarily used for extracting and recognizing text content from images.

Text Recognition

This model is outdated. It is recommended to use the official Nougat model. Nougat is an advanced vision-language model focused on document understanding and analysis.

An OCR model specifically designed for transcribing E-13B MICR codes, fine-tuned from Microsoft's TrOCR-large-stage1.

Text Recognition

Transformers English

Pix2struct Tiny Random

An image-to-text model released under the MIT license, capable of converting image content into descriptive text.

General Image Captioning

An image-to-text model released under the Apache-2.0 license, capable of converting image content into textual descriptions.

Text Recognition

Transformers Other

A model fine-tuned from naver-clova-ix/donut-base. Its specific uses and functions are not documented.

Layoutlmv3 Finetuned DocLayNet

A document layout analysis model fine-tuned from the LayoutLMv3 architecture, specifically designed for classifying document elements on the DocLayNet dataset.

Text Recognition

Transformers English

Invoices Donut Model V1

Sparrow is a document data extraction model fine-tuned on invoice data based on the Donut ML foundation model, aimed at validating Donut's performance on enterprise documents.

Transformers English

Mscoco Finetuned CoCa ViT L 14 Laion2b S13b B90k

An image-to-text model released under the MIT license, capable of converting image content into textual descriptions.

This is a Donut model fine-tuned on the CORD-v2 dataset, designed for image-to-text tasks, achieving an average accuracy of 0.901.

Layoutlmv3 Finetuned Funsd

A document understanding model fine-tuned from microsoft/layoutlmv3-base on the nielsr/funsd-layoutlmv3 dataset.

Text Recognition

A model fine-tuned from naver-clova-ix/donut-base. Its specific purpose is not explicitly stated.

OCR LayoutLMv3 Invoice

An invoice recognition model fine-tuned from LayoutLMv3-base on the wild_receipt dataset that excels at extracting structured information from invoices.

Sequence Labeling

Trocr Large Str

TrOCR is a Transformer-based optical character recognition model designed for single-line text images, fine-tuned on multiple standard datasets.

Text Recognition

Layoutlmv3 Finetuned Invoice

An invoice information extraction model fine-tuned from LayoutLMv3-base on the SROIE dataset that excels at token classification tasks.

Text Recognition

Layoutlmv3 Finetuned Wildreceipt

A version of the LayoutLMv3-base model fine-tuned on the WildReceipt dataset, designed for receipt key information extraction tasks.

Text Recognition

Theivaprakasham

Layoutlmv3 Finetuned Invoice

An invoice information extraction model fine-tuned from the LayoutLMv3 architecture that performs strongly on the SROIE dataset.

Text Recognition

Layoutlmv3 Finetuned Sroie

A document understanding model fine-tuned from Microsoft's LayoutLMv3-base model on the SROIE dataset that excels at extracting structured information from scanned documents.

Text Recognition

Theivaprakasham
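
Token classification models like the LayoutLMv3 fine-tunes listed here assign one label per token, and a downstream step usually groups those labels into fields. The sketch below shows one common way to do that with BIO-style tags. The `group_entities` helper, the label names, and the sample receipt are assumptions made for illustration; the label sets actually used by these checkpoints may differ.

```python
def group_entities(words, labels):
    """Collapse per-token BIO labels into (field, text) pairs.

    Hypothetical helper: assumes labels look like "B-FIELD", "I-FIELD", or "O".
    """
    entities = []
    current_field, current_words = None, []

    def flush():
        # Store the entity collected so far, if there is one.
        if current_field is not None:
            entities.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # A new entity starts (an "I-" tag without a matching open entity also starts one).
            flush()
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O": the token belongs to no field, so close any open entity.
            flush()
            current_field, current_words = None, []
    flush()
    return entities


# Illustrative receipt tokens with made-up predictions.
words = ["ACME", "STORE", "Date:", "01/02/2019", "TOTAL", "12.50"]
labels = ["B-COMPANY", "I-COMPANY", "O", "B-DATE", "O", "B-TOTAL"]

fields = group_entities(words, labels)
print(fields)
```

Treating a stray "I-" tag as the start of a new entity is a lenient choice; a stricter pipeline could discard such tokens instead.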

Layoutlmv3 Finetuned Invoice

A version of LayoutLMv3-base fine-tuned on an invoice dataset for invoice information extraction.

Text Recognition

Theivaprakasham

© 2025 AIbase